Multi-Word Unit Dependency Forest-based Translation Rule Extraction
Authors
Abstract
Translation requires a non-isomorphic transformation from the source to the target. However, non-isomorphism can be reduced by learning multi-word units (MWUs). We present a novel way of representing sentence structure based on MWUs, which are not necessarily continuous word sequences. Our proposed method builds a structure over MWUs that is simpler than a dependency structure whose vertices are individual words. Unlike previous studies, we collect many alternative structures in a packed forest. As an application of our proposed method, we extract translation rules in the form of a source MWU forest mapped to a target string, and verify the rule coverage empirically. As a result, we improve the rule coverage compared to previous work, while retaining linear asymptotic complexity.
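To make the idea of using MWUs rather than words as vertices more concrete, here is a minimal sketch. It is not the authors' algorithm: the function name `collapse_to_mwus`, the head-index encoding, and the example sentence are all assumptions made for illustration, and it builds only a single collapsed structure rather than a packed forest of alternatives.

```python
def collapse_to_mwus(heads, mwus):
    """Collapse a word-level dependency tree into a tree over MWUs.

    heads: heads[i] is the index of the head of word i, or -1 for the root.
    mwus:  iterable of sets of word indices; a set may be discontinuous.
    Words not covered by any MWU become single-word units.
    Returns (units, unit_heads), with units ordered by their first word.
    """
    units = [frozenset(m) for m in mwus]
    covered = set().union(*units)
    units += [frozenset([i]) for i in range(len(heads)) if i not in covered]
    units.sort(key=min)

    # Map every word index to the unit that contains it.
    unit_of = {w: u for u, members in enumerate(units) for w in members}

    unit_heads = []
    for members in units:
        # Keep only heads that lie outside the unit; arcs inside it vanish.
        external = {
            unit_of[heads[w]] if heads[w] >= 0 else -1
            for w in members
            if heads[w] not in members
        }
        if len(external) != 1:
            raise ValueError("unit does not attach to exactly one head")
        unit_heads.append(external.pop())
    return units, unit_heads


# Hypothetical example: "He took the offer into account"
# word indices:           0   1    2    3    4     5
heads = [1, -1, 3, 1, 1, 4]
# Assumed discontinuous MWU: "took ... into account"
units, unit_heads = collapse_to_mwus(heads, [{1, 4, 5}])
```

In this toy case the six word vertices collapse to four unit vertices, because the arcs inside the discontinuous unit disappear. The sketch simply rejects a unit whose words attach to more than one outside head; how the actual method handles such cases is not stated in the abstract.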
Similar resources
Multi-task Learning for Word Alignment and Dependency Parsing
Word alignment and parsing are two important components of syntax-based machine translation. Inconsistent models for alignment and parsing cause problems during translation pair extraction. In this paper, we perform word alignment and dependency parsing in a multi-task learning framework, in which word alignment and dependency parsing are consistent and assist each other. Our experiment...
Better Filtration and Augmentation for Hierarchical Phrase-Based Translation Rules
This paper presents a novel filtration criterion to restrict rule extraction for the hierarchical phrase-based translation model, where a bilingual but relaxed well-formed dependency restriction is used to filter out bad rules. Furthermore, a new feature is proposed that describes the regularity with which a source/target dependency edge triggers a target/source word. Experimental result...
A Hybrid Machine Translation System Based on a Monotone Decoder
In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...
Feature extraction in opinion mining through Persian reviews
Opinion mining deals with the analysis of user reviews to extract their opinions, sentiments and demands in a specific area, which can play an important role in making major decisions in that area. In general, opinion mining extracts user reviews at three levels: document, sentence and feature. Opinion mining at the feature level is taken into consideration more than the other two levels d...
The Induction and Evaluation of Word Order Rules using Corpora based on the Two Concepts of Topological Models
Using dependency trees in natural language generation and machine translation raises the need to derive the word order from dependency trees. This task is difficult for languages with (partly) free word order and comparatively easier for languages with fixed word order. This paper describes (a) the two basic elements of topological models, (b) rule patterns for the mapping of dependency trees to ...